{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Notebook to demonstrate Data-Services workflow\n",
    "### The workflow in a nutshell\n",
    "TAO Data Services include 4 key pipelines:\n",
    "1. Offline data augmentation using DALI\n",
    "2. Auto labeling using TAO Mask Auto-labeler (MAL)\n",
    "3. Annotation conversion\n",
    "4. Groundtruth analytics\n",
    "\n",
    "## Learning Objectives\n",
    "\n",
    "In this notebook, you will learn how to leverage the simplicity and convenience of TAO to:\n",
    "\n",
    "* Convert KITTI dataset to COCO format\n",
    "* Run auto-labeling to generate pseudo masks for KITTI bounding boxes\n",
    "* Apply data augmentation to the KITTI dataset with bounding boxe refinement\n",
    "* Run data analytics to collect useful statistics on the original and augmented KITTI dataset\n",
    "\n",
    "### Table of contents\n",
    "\n",
    "1. [Convert KITTI data to COCO format](#head-1)\n",
    "2. [Generate pseudo-masks with the auto-labeler](#head-2)\n",
    "3. [Apply data augmentation](#head-3)\n",
    "4. [Perform data analytics](#head-4)\n",
    "\n",
    "\n",
    "### Requirements\n",
    "Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "import json\n",
    "import os\n",
    "import requests\n",
    "import time\n",
    "from IPython.display import clear_output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### FIXME\n",
    "\n",
    "1. Assign a workdir in FIXME 1\n",
    "1. Assign the ip_address and port_number in FIXME 2 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))\n",
    "1. Assign the ngc_api_key variable in FIXME 3\n",
    "1. Assign path to save the dataset in FIXME 4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "workdir = \"workdir_data_services\"  # FIXME 1\n",
    "host_url = \"http://<ip_address>:<port_number>\"  # FIXME 2 example: https://10.137.149.22:32334\n",
    "# In host machine, node ip_address and port_number can be obtained as follows,\n",
    "# ip_address: hostname -I\n",
    "# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'\n",
    "ngc_api_key = \"<ngc_api_key>\"  # FIXME 3 example: (Add NGC API key) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_dir = \"dataset_path\"    # FIXME 4\n",
    "job_map = {}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Exchange NGC_API_KEY for JWT\n",
    "data = json.dumps({\"ngc_api_key\": ngc_api_key})\n",
    "response = requests.post(f\"{host_url}/api/v1/login\", data=data)\n",
    "user_id = response.json()[\"user_id\"]\n",
    "print(\"User ID\", user_id)\n",
    "token = response.json()[\"token\"]\n",
    "print(\"JWT\", token)\n",
    "\n",
    "# Set base URL\n",
    "base_url = f\"{host_url}/api/v1/users/{user_id}\"\n",
    "print(\"API Calls will be forwarded to\", base_url)\n",
    "\n",
    "headers = {\"Authorization\": f\"Bearer {token}\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Creating workdir\n",
    "if not os.path.isdir(workdir):\n",
    "    os.makedirs(workdir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Function to split tar files <a class=\"anchor\" id=\"head-1.1\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import tarfile\n",
    "\n",
    "def split_tar_file(input_tar_path, output_dir, max_split_size=0.2*1024*1024*1024):\n",
    "\tos.makedirs(output_dir, exist_ok=True)\n",
    "\t\n",
    "\twith tarfile.open(input_tar_path, 'r') as original_tar:\n",
    "\t\tmembers = original_tar.getmembers()\n",
    "\t\tcurrent_split_size = 0\n",
    "\t\tcurrent_split_number = 0\n",
    "\t\tcurrent_split_name = os.path.join(output_dir, f'smaller_file_{current_split_number}.tar')\n",
    "\t\t\n",
    "\t\twith tarfile.open(current_split_name, 'w') as split_tar:\n",
    "\t\t\tfor member in members:\n",
    "\t\t\t\tif current_split_size + member.size <= max_split_size:\n",
    "\t\t\t\t\tsplit_tar.addfile(member, original_tar.extractfile(member))\n",
    "\t\t\t\t\tcurrent_split_size += member.size\n",
    "\t\t\t\telse:\n",
    "\t\t\t\t\tsplit_tar.close()\n",
    "\t\t\t\t\tcurrent_split_number += 1\n",
    "\t\t\t\t\tcurrent_split_name = os.path.join(output_dir, f'smaller_file_{current_split_number}.tar')\n",
    "\t\t\t\t\tcurrent_split_size = 0\n",
    "\t\t\t\t\tsplit_tar = tarfile.open(current_split_name, 'w')  # Open a new split tar archive\n",
    "\t\t\t\t\tsplit_tar.addfile(member, original_tar.extractfile(member))\n",
    "\t\t\t\t\tcurrent_split_size += member.size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Convert KITTI data to COCO format <a class=\"anchor\" id=\"head-1\"></a>\n",
    "We would first convert the dataset from KITTI to COCO formats.\n",
    "\n",
    "### Create the dataset\n",
    "We support both KITTI and COCO data formats\n",
    "\n",
    "KITTI dataset follow the directory structure displayed below:\n",
    "```\n",
    "$DATA_DIR/dataset\n",
    "├── images\n",
    "│   ├── image_name_1.jpg\n",
    "│   ├── image_name_2.jpg\n",
    "|   ├── ...\n",
    "└── labels\n",
    "    ├── image_name_1.txt\n",
    "    ├── image_name_2.txt\n",
    "    ├── ...\n",
    "```\n",
    "\n",
    "And COCO dataset follow the directory structure displayed below:\n",
    "```\n",
    "$DATA_DIR/dataset\n",
    "├── images\n",
    "│   ├── image_name_1.jpg\n",
    "│   ├── image_name_2.jpg\n",
    "|   ├── ...\n",
    "└── annotations.json\n",
    "```\n",
    "For this notebook, we will be using the kitti object detection dataset for this example. To find more details, please visit [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assigning the task and action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Defining the task\n",
    "network_arch = \"annotations\"\n",
    "action = \"convert\"\n",
    "\n",
    "data = json.dumps({\"network_arch\": network_arch})\n",
    "\n",
    "endpoint = f\"{base_url}/experiments\"\n",
    "\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "print(response.json())\n",
    "\n",
    "annotation_conversion_experiment_id = response.json()[\"id\"]\n",
    "print(annotation_conversion_experiment_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List tasks\n",
    "endpoint = f\"{base_url}/experiments\"\n",
    "\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(\"model id\\t\\t\\t     network architecture\")\n",
    "for rsp in response.json():\n",
    "    print(rsp[\"id\"],rsp[\"network_arch\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Dataset Links\n",
    "images_url = \"https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_image_2.zip\"\n",
    "labels_url = \"https://s3.eu-central-1.amazonaws.com/avg-kitti/data_object_label_2.zip\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Download the dataset\n",
    "!wget -O images.zip {images_url}\n",
    "!wget -O labels.zip {labels_url}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!unzip -q images.zip -d {data_dir}/\n",
    "!unzip -q labels.zip -d {data_dir}/\n",
    "!mkdir -p {data_dir}/images {data_dir}/labels\n",
    "!mv {data_dir}/training/image_2/000* {data_dir}/images/\n",
    "!mv {data_dir}/training/label_2/000* {data_dir}/labels/\n",
    "!cd {data_dir} && tar -cf kitti_dataset.tar images labels\n",
    "!rm -rf images.zip labels.zip {data_dir}/training/ {data_dir}/training/ {data_dir}/testing/"
   ]
  },
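  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before creating the dataset, we can quickly verify that the prepared directory matches the KITTI layout described above. This is a minimal sanity-check sketch: it only compares file basenames between `images/` and `labels/`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sanity check: every image should have a matching label file\n",
    "image_stems = {os.path.splitext(f)[0] for f in os.listdir(f\"{data_dir}/images\")}\n",
    "label_stems = {os.path.splitext(f)[0] for f in os.listdir(f\"{data_dir}/labels\")}\n",
    "print(f\"{len(image_stems)} images, {len(label_stems)} labels\")\n",
    "print(f\"{len(image_stems - label_stems)} images without labels, {len(label_stems - image_stems)} labels without images\")"
   ]
  },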
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Create Dataset\n",
    "ds_type = \"object_detection\"\n",
    "ds_format = \"kitti\"\n",
    "data = json.dumps({\"type\": ds_type, \"format\": ds_format})\n",
    "\n",
    "endpoint = f\"{base_url}/datasets\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())\n",
    "kitti_dataset_id = response.json()[\"id\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Update\n",
    "dataset_information = {\"name\": \"Dataset\",\n",
    "                       \"description\": \"My dataset\"}\n",
    "data = json.dumps(dataset_information)\n",
    "\n",
    "endpoint = f\"{base_url}/datasets/{kitti_dataset_id}\"\n",
    "\n",
    "response = requests.patch(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Upload\n",
    "dataset_path = f\"{data_dir}/kitti_dataset.tar\"\n",
    "output_dir = os.path.join(os.path.dirname(os.path.abspath(dataset_path)), network_arch, \"test\")\n",
    "split_tar_file(dataset_path, output_dir)\n",
    "for idx, tar_dataset_path in enumerate(os.listdir(output_dir)):\n",
    "    print(f\"Uploading {idx+1}/{len(os.listdir(output_dir))} tar split\")\n",
    "    files = [(\"file\", open(os.path.join(output_dir, tar_dataset_path),\"rb\"))]\n",
    "\n",
    "    endpoint = f\"{base_url}/datasets/{kitti_dataset_id}:upload\"\n",
    "\n",
    "    response = requests.post(endpoint, files=files, headers=headers)\n",
    "\n",
    "    print(response)\n",
    "    print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### List the created datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# List the created datasets\n",
    "endpoint = f\"{base_url}/datasets\"\n",
    "\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(\"id\\t\\t\\t\\t\\t type\\t\\t\\t format\\t\\t name\")\n",
    "for rsp in response.json():\n",
    "    print(f\"{rsp['id']}\\t{rsp['type']}\\t{rsp['format']}\\t\\t{rsp['name']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the dataset\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Assign Dataset\n",
    "dataset_information = {\"inference_dataset\": kitti_dataset_id}\n",
    "\n",
    "data = json.dumps(dataset_information)\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{annotation_conversion_experiment_id}\"\n",
    "\n",
    "response = requests.patch(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the specs\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Get default spec schema\n",
    "endpoint = f\"{base_url}/experiments/{annotation_conversion_experiment_id}/specs/{action}/schema\"\n",
    " \n",
    "response = requests.get(endpoint, headers=headers)\n",
    "\n",
    "print(response)\n",
    "annotations_conversion_specs = response.json()[\"default\"]\n",
    "print(json.dumps(annotations_conversion_specs, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Updating spec file\n",
    "annotations_conversion_specs[\"data\"][\"input_format\"] = \"KITTI\"\n",
    "annotations_conversion_specs[\"data\"][\"output_format\"] = \"COCO\"\n",
    "print(json.dumps(annotations_conversion_specs, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Execute the data format conversion action \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Run action\n",
    "parent = None\n",
    "data = json.dumps({\"parent_job_id\":parent, \"action\":action, \"specs\":annotations_conversion_specs})\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{annotation_conversion_experiment_id}/jobs\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())\n",
    "\n",
    "job_map[action] = response.json()\n",
    "print(job_map)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Monitor job status by repeatedly running this cell\n",
    "convert_job_id = job_map[action]\n",
    "endpoint = f\"{base_url}/experiments/{annotation_conversion_experiment_id}/jobs/{convert_job_id}\"\n",
    "\n",
    "while True:\n",
    "    clear_output(wait=True)\n",
    "    response = requests.get(endpoint, headers=headers)\n",
    "    \n",
    "    if \"error_desc\" in response.json().keys() and response.json()[\"error_desc\"] in (\"Job trying to retrieve not found\", \"No AutoML run found\"):\n",
    "        print(\"Job is being created\")\n",
    "        time.sleep(5)\n",
    "        continue\n",
    "    print(response)\n",
    "    print(json.dumps(response.json(), sort_keys=True, indent=4))\n",
    "       \n",
    "    if response.json().get(\"status\") in [\"Done\",\"Error\", \"Canceled\"] or response.status_code not in (200,201):\n",
    "        break\n",
    "    time.sleep(15)"
   ]
  },
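  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The polling loop above repeats for every job in this notebook. For reference, a minimal sketch of the same pattern as a reusable helper is shown below; the later cells keep the explicit loop so each step stays self-contained."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch of a reusable polling helper (not used by the cells below)\n",
    "def wait_for_job(experiment_id, job_id, poll_interval=15):\n",
    "    \"\"\"Poll a job until it reaches a terminal state and return its final status.\"\"\"\n",
    "    job_endpoint = f\"{base_url}/experiments/{experiment_id}/jobs/{job_id}\"\n",
    "    while True:\n",
    "        clear_output(wait=True)\n",
    "        poll_response = requests.get(job_endpoint, headers=headers)\n",
    "        resp_json = poll_response.json()\n",
    "        if resp_json.get(\"error_desc\") in (\"Job trying to retrieve not found\", \"No AutoML run found\"):\n",
    "            print(\"Job is being created\")\n",
    "            time.sleep(5)\n",
    "            continue\n",
    "        print(poll_response)\n",
    "        print(json.dumps(resp_json, sort_keys=True, indent=4))\n",
    "        if resp_json.get(\"status\") in (\"Done\", \"Error\", \"Canceled\") or poll_response.status_code not in (200, 201):\n",
    "            return resp_json.get(\"status\")\n",
    "        time.sleep(poll_interval)"
   ]
  },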
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Download job contents\n",
    "convert_job_id = job_map[action]\n",
    "endpoint = f\"{base_url}/experiments/{annotation_conversion_experiment_id}/jobs/{convert_job_id}\"\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "expected_file_size = response.json().get(\"job_tar_stats\", {}).get(\"file_size\")\n",
    "print(\"expected_file_size: \", expected_file_size)\n",
    "\n",
    "endpoint = f'{base_url}/experiments/{annotation_conversion_experiment_id}/jobs/{convert_job_id}:download'\n",
    "temptar = f'{convert_job_id}.tar.gz'\n",
    "\n",
    "while True:\n",
    "    # Check if the file already exists\n",
    "    headers_download_job = dict(headers)\n",
    "    if os.path.exists(temptar):\n",
    "        # Get the current file size\n",
    "        file_size = os.path.getsize(temptar)\n",
    "\n",
    "        # If the file size matches the expected size, break out of the loop\n",
    "        if file_size >= expected_file_size:\n",
    "            print(\"Download completed successfully.\")\n",
    "            print(\"Untarring\")\n",
    "            # Untar to destination\n",
    "            tar_command = f'tar -xf {temptar} -C {workdir}/'\n",
    "            os.system(tar_command)\n",
    "            os.remove(temptar)\n",
    "            print(f\"Results at {workdir}/{convert_job_id}\")\n",
    "            convert_out_path = f\"{workdir}/{convert_job_id}\"\n",
    "            break\n",
    "\n",
    "        # Set the headers to resume the download from where it left off\n",
    "        headers_download_job['Range'] = f'bytes={file_size}-'\n",
    "    # Open the file for writing in binary mode\n",
    "    with open(temptar, 'ab') as f:\n",
    "        try:\n",
    "            response = requests.get(endpoint, headers=headers_download_job, stream=True)\n",
    "            # Check if the request was successful\n",
    "            if response.status_code in [200, 206]:\n",
    "                # Iterate over the content in chunks\n",
    "                for chunk in response.iter_content(chunk_size=1024):\n",
    "                    if chunk:\n",
    "                        # Write the chunk to the file\n",
    "                        f.write(chunk)\n",
    "                        # Flush and sync the file to disk\n",
    "                        f.flush()\n",
    "                        os.fsync(f.fileno())\n",
    "            else:\n",
    "                print(f\"Failed to download file. Status code: {response.status_code}\")\n",
    "        except requests.exceptions.RequestException as e:\n",
    "            print(\"Connection interrupted during download, resuming download from breaking point\")\n",
    "            time.sleep(5)  # Sleep for a while before retrying the request\n",
    "            continue  # Continue the loop to retry the request\n"
   ]
  },
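  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, we can inspect the converted annotations before moving on. This sketch assumes the conversion job writes a COCO file named `<dataset_id>.json`, which is the same path the copy step in the next section relies on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: peek at the converted COCO annotations\n",
    "coco_json_path = f\"{convert_out_path}/{kitti_dataset_id}.json\"\n",
    "with open(coco_json_path, \"r\") as f:\n",
    "    coco = json.load(f)\n",
    "print(\"images:\", len(coco.get(\"images\", [])))\n",
    "print(\"annotations:\", len(coco.get(\"annotations\", [])))\n",
    "print(\"categories:\", [c.get(\"name\") for c in coco.get(\"categories\", [])])"
   ]
  },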
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Generate pseudo-masks with the auto-labeler <a class=\"anchor\" id=\"head-2\"></a>\n",
    "Here we will use a pretrained MAL model to generate pseudo-masks for the converted KITTI data. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Defining the task\n",
    "network_arch = \"auto_label\"\n",
    "action = \"generate\"\n",
    "\n",
    "data = json.dumps({\"network_arch\": network_arch})\n",
    "print(data)\n",
    "endpoint = f\"{base_url}/experiments\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())\n",
    "pseudo_mask_experiment_id = response.json()[\"id\"]\n",
    "print(pseudo_mask_experiment_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Reformatting the dataset\n",
    "# Untar to destination\n",
    "tar_command = f'mkdir -p {workdir}/{convert_job_id}_coco/ && tar -xf {dataset_path} -C {workdir}/{convert_job_id}_coco/'\n",
    "os.system(tar_command)\n",
    "\n",
    "# Copy the annotations\n",
    "copy_command = f'cp {convert_out_path}/{kitti_dataset_id}.json {workdir}/{convert_job_id}_coco/annotations.json'\n",
    "os.system(copy_command)\n",
    "\n",
    "# Tar the dataset\n",
    "tar_command = f'cd {workdir} && tar -cf {convert_job_id}_coco.tar {convert_job_id}_coco'\n",
    "os.system(tar_command)\n",
    "coco_data = f'{workdir}/{convert_job_id}_coco.tar'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the dataset\n",
    "We would be formatting the original dataset to include the COCO annotations generated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Create Dataset\n",
    "ds_type = \"object_detection\"\n",
    "ds_format = \"coco\"\n",
    "data = json.dumps({\"type\": ds_type,\"format\": ds_format})\n",
    "\n",
    "endpoint = f\"{base_url}/datasets\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())\n",
    "coco_dataset_id = response.json()[\"id\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Update dataset information\n",
    "dataset_information = {\"name\": \"Dataset\",\n",
    "                       \"description\": \"My dataset\"}\n",
    "data = json.dumps(dataset_information)\n",
    "\n",
    "endpoint = f\"{base_url}/datasets/{coco_dataset_id}\"\n",
    "\n",
    "response = requests.patch(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Upload\n",
    "output_dir = os.path.join(os.path.dirname(os.path.abspath(coco_data)), network_arch, \"test\")\n",
    "split_tar_file(coco_data, output_dir)\n",
    "for idx, tar_dataset_path in enumerate(os.listdir(output_dir)):\n",
    "    print(f\"Uploading {idx+1}/{len(os.listdir(output_dir))} tar split\")\n",
    "    files = [(\"file\",open(os.path.join(output_dir, tar_dataset_path), \"rb\"))]\n",
    "\n",
    "    endpoint = f\"{base_url}/datasets/{coco_dataset_id}:upload\"\n",
    "\n",
    "    response = requests.post(endpoint, files=files, headers=headers)\n",
    "\n",
    "    print(response)\n",
    "    print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Find the PTM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# List models\n",
    "endpoint = f\"{base_url}/experiments\"\n",
    "\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(\"model id\\t\\t\\t     network architecture\")\n",
    "for rsp in response.json():\n",
    "    if rsp[\"name\"] == \"Mask Auto Label\":\n",
    "        print(f'PTM Name: {rsp[\"name\"]}; PTM version: {rsp[\"version\"]}; NGC PATH: {rsp[\"ngc_path\"]}; Additional info: {rsp[\"additional_id_info\"]}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "pretrained_map = {\"auto_label\" : \"mask_auto_label:trainable_v1.0\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Get pretrained model\n",
    "model_list = f\"{base_url}/experiments\"\n",
    "response = requests.get(model_list, headers=headers)\n",
    "\n",
    "response_json = response.json()\n",
    "\n",
    "# Search for ptm with given ngc path\n",
    "ptm = []\n",
    "for rsp in response_json:\n",
    "    if rsp[\"network_arch\"] == network_arch and rsp[\"ngc_path\"].endswith(pretrained_map[network_arch]):\n",
    "        ptm_id = rsp[\"id\"]\n",
    "        ptm = [ptm_id]\n",
    "        print(\"Metadata for model with requested NGC Path\")\n",
    "        print(rsp)\n",
    "        break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Assign Dataset\n",
    "dataset_information = {\"inference_dataset\": coco_dataset_id}\n",
    "\n",
    "data = json.dumps(dataset_information)\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{pseudo_mask_experiment_id}\"\n",
    "\n",
    "response = requests.patch(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the PTM"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Assign PTM\n",
    "dataset_information = {\"base_experiment\": ptm}\n",
    "\n",
    "data = json.dumps(dataset_information)\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{pseudo_mask_experiment_id}\"\n",
    "\n",
    "response = requests.patch(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the specs\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Get default spec schema\n",
    "endpoint = f\"{base_url}/experiments/{pseudo_mask_experiment_id}/specs/{action}/schema\"\n",
    "\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "\n",
    "print(response)\n",
    "auto_label_generate_specs = response.json()[\"default\"]\n",
    "print(json.dumps(auto_label_generate_specs, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Override any of the parameters listed in the previous cell as required\n",
    "auto_label_generate_specs[\"gpu_ids\"] = [0]\n",
    "print(json.dumps(auto_label_generate_specs, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Execute the auto labeling action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Run action\n",
    "parent = None\n",
    "\n",
    "data = json.dumps({\"parent_job_id\": parent, \"action\":action, \"specs\":auto_label_generate_specs})\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{pseudo_mask_experiment_id}/jobs\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())\n",
    "\n",
    "job_map[action] = response.json()\n",
    "print(job_map)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Monitor job status by repeatedly running this cell\n",
    "label_job_id = job_map[action]\n",
    "endpoint = f\"{base_url}/experiments/{pseudo_mask_experiment_id}/jobs/{label_job_id}\"\n",
    "\n",
    "while True: \n",
    "    clear_output(wait=True)\n",
    "    response = requests.get(endpoint, headers=headers)\n",
    "    \n",
    "    if \"error_desc\" in response.json().keys() and response.json()[\"error_desc\"] in (\"Job trying to retrieve not found\", \"No AutoML run found\"):\n",
    "        print(\"Job is being created\")\n",
    "        time.sleep(5)\n",
    "        continue\n",
    "    print(response)\n",
    "    print(json.dumps(response.json(), sort_keys=True, indent=4))\n",
    "\n",
    "    if response.json().get(\"status\") in [\"Done\",\"Error\", \"Canceled\"] or response.status_code not in (200,201):\n",
    "        break\n",
    "    time.sleep(15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Download job contents\n",
    "label_job_id = job_map[action]\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{pseudo_mask_experiment_id}/jobs/{label_job_id}\"\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "expected_file_size = response.json().get(\"job_tar_stats\", {}).get(\"file_size\")\n",
    "print(\"expected_file_size: \", expected_file_size)\n",
    "\n",
    "endpoint = f'{base_url}/experiments/{pseudo_mask_experiment_id}/jobs/{label_job_id}:download'\n",
    "# Save\n",
    "temptar = f'{label_job_id}.tar.gz'\n",
    "\n",
    "while True:\n",
    "    # Check if the file already exists\n",
    "    headers_download_job = dict(headers)\n",
    "    if os.path.exists(temptar):\n",
    "        # Get the current file size\n",
    "        file_size = os.path.getsize(temptar)\n",
    "\n",
    "        # If the file size matches the expected size, break out of the loop\n",
    "        if file_size >= expected_file_size:\n",
    "            print(\"Download completed successfully.\")\n",
    "            print(\"Untarring\")\n",
    "            # Untar to destination\n",
    "            tar_command = f'tar -xf {temptar} -C {workdir}/'\n",
    "            os.system(tar_command)\n",
    "            os.remove(temptar)\n",
    "            print(f\"Results at {workdir}/{label_job_id}\")\n",
    "            inference_out_path = f\"{workdir}/{label_job_id}\"\n",
    "            break\n",
    "\n",
    "        # Set the headers to resume the download from where it left off\n",
    "        headers_download_job['Range'] = f'bytes={file_size}-'\n",
    "    # Open the file for writing in binary mode\n",
    "    with open(temptar, 'ab') as f:\n",
    "        try:\n",
    "            response = requests.get(endpoint, headers=headers_download_job, stream=True)\n",
    "            # Check if the request was successful\n",
    "            if response.status_code in [200, 206]:\n",
    "                # Iterate over the content in chunks\n",
    "                for chunk in response.iter_content(chunk_size=1024):\n",
    "                    if chunk:\n",
    "                        # Write the chunk to the file\n",
    "                        f.write(chunk)\n",
    "                        # Flush and sync the file to disk\n",
    "                        f.flush()\n",
    "                        os.fsync(f.fileno())\n",
    "            else:\n",
    "                print(f\"Failed to download file. Status code: {response.status_code}\")\n",
    "        except requests.exceptions.RequestException as e:\n",
    "            print(\"Connection interrupted during download, resuming download from breaking point\")\n",
    "            time.sleep(5)  # Sleep for a while before retrying the request\n",
    "            continue  # Continue the loop to retry the request"
   ]
  },
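  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, we can check how many annotations received a pseudo-mask. This is a small sketch that assumes `label.json` is COCO-style, which is also what the copy step in the next section expects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: count annotations that carry a segmentation pseudo-mask\n",
    "label_json_path = f\"{workdir}/{label_job_id}/label.json\"\n",
    "with open(label_json_path, \"r\") as f:\n",
    "    labels = json.load(f)\n",
    "annotations = labels.get(\"annotations\", [])\n",
    "num_masks = sum(1 for ann in annotations if ann.get(\"segmentation\"))\n",
    "print(f\"{num_masks} of {len(annotations)} annotations have pseudo-masks\")"
   ]
  },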
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Apply data augmentation <a class=\"anchor\" id=\"head-3\"></a>\n",
    "In this section, we run offline augmentation with the original dataset. During the augmentation process, we can use the pseudo-masks generated from the last step to refine the distorted or rotated bounding boxes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assigning the task and action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Defining the task\n",
    "network_arch = \"augmentation\"\n",
    "action = \"generate\"\n",
    "\n",
    "data = json.dumps({\"network_arch\": network_arch})\n",
    "\n",
    "endpoint = f\"{base_url}/experiments\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "data_aug_experiment_id = response.json()[\"id\"]\n",
    "print(data_aug_experiment_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List tasks\n",
    "endpoint = f\"{base_url}/experiments\"\n",
    "\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(\"model id\\t\\t\\t     network architecture\")\n",
    "for rsp in response.json():\n",
    "    print(rsp[\"id\"],rsp[\"network_arch\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create the dataset\n",
    "We would be formatting the dataset to include the generated mask information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Format the dataset\n",
    "copy_command = f'cp {workdir}/{label_job_id}/label.json {workdir}/{convert_job_id}_coco'\n",
    "os.system(copy_command)\n",
    "\n",
    "# Tar the dataset\n",
    "tar_command = f'cd {workdir} && tar -cvf {label_job_id}_coco.tar {convert_job_id}_coco'\n",
    "os.system(tar_command)\n",
    "coco_data = f'{workdir}/{label_job_id}_coco.tar'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Create Dataset\n",
    "ds_type = \"object_detection\"\n",
    "ds_format = \"coco\"\n",
    "data = json.dumps({\"type\": ds_type,\"format\": ds_format})\n",
    "\n",
    "endpoint = f\"{base_url}/datasets\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())\n",
    "coco_dataset_augment_id = response.json()[\"id\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Update\n",
    "dataset_information = {\"name\": \"Dataset\",\n",
    "                       \"description\": \"My dataset\"}\n",
    "data = json.dumps(dataset_information)\n",
    "\n",
    "endpoint = f\"{base_url}/datasets/{coco_dataset_augment_id}\"\n",
    "\n",
    "response = requests.patch(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Upload\n",
    "output_dir = os.path.join(os.path.dirname(os.path.abspath(coco_data)), network_arch, \"test\")\n",
    "split_tar_file(coco_data, output_dir)\n",
    "for idx, tar_dataset_path in enumerate(os.listdir(output_dir)):\n",
    "    print(f\"Uploading {idx+1}/{len(os.listdir(output_dir))} tar split\")\n",
    "    files = [(\"file\",open(os.path.join(output_dir, tar_dataset_path), \"rb\"))]\n",
    "\n",
    "    endpoint = f\"{base_url}/datasets/{coco_dataset_augment_id}:upload\"\n",
    "\n",
    "    response = requests.post(endpoint, files=files, headers=headers)\n",
    "\n",
    "    print(response)\n",
    "    print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the dataset\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Assign Dataset\n",
    "dataset_information = {\"inference_dataset\": coco_dataset_augment_id}\n",
    "\n",
    "data = json.dumps(dataset_information)\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{data_aug_experiment_id}\"\n",
    "\n",
    "response = requests.patch(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the specs\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Get default spec schema\n",
    "endpoint = f\"{base_url}/experiments/{data_aug_experiment_id}/specs/{action}/schema\"\n",
    "\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "\n",
    "print(response)\n",
    "augmentation_generate_specs = response.json()[\"default\"]\n",
    "print(json.dumps(augmentation_generate_specs, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make changes to the specs if necessary\n",
    "print(json.dumps(augmentation_generate_specs, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Execute the data augmentation action\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Run action\n",
    "parent = None\n",
    "\n",
    "data = json.dumps({\"parent_job_id\":parent, \"action\":action, \"specs\":augmentation_generate_specs})\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{data_aug_experiment_id}/jobs\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())\n",
    "\n",
    "job_map[action] = response.json()\n",
    "print(job_map)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Monitor job status by repeatedly running this cell\n",
    "augment_job_id = job_map[action]\n",
    "endpoint = f\"{base_url}/experiments/{data_aug_experiment_id}/jobs/{augment_job_id}\"\n",
    "\n",
    "while True:\n",
    "    clear_output(wait=True)\n",
    "    response = requests.get(endpoint, headers=headers)\n",
    "    \n",
    "    if \"error_desc\" in response.json().keys() and response.json()[\"error_desc\"] in (\"Job trying to retrieve not found\", \"No AutoML run found\"):\n",
    "        print(\"Job is being created\")\n",
    "        time.sleep(5)\n",
    "        continue\n",
    "    print(response)\n",
    "    print(json.dumps(response.json(), sort_keys=True, indent=4))\n",
    "\n",
    "    if response.json().get(\"status\") in [\"Done\",\"Error\", \"Canceled\"] or response.status_code not in (200,201):\n",
    "        break\n",
    "    time.sleep(15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Download job contents\n",
    "augment_job_id = job_map[action]\n",
    "temptar = f'{augment_job_id}.tar.gz'\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{data_aug_experiment_id}/jobs/{augment_job_id}\"\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "expected_file_size = response.json().get(\"job_tar_stats\", {}).get(\"file_size\")\n",
    "print(\"expected_file_size: \", expected_file_size)\n",
    "\n",
    "endpoint = f'{base_url}/experiments/{data_aug_experiment_id}/jobs/{augment_job_id}:download'\n",
    "\n",
    "while True:\n",
    "    # Check if the file already exists\n",
    "    headers_download_job = dict(headers)\n",
    "    if os.path.exists(temptar):\n",
    "        # Get the current file size\n",
    "        file_size = os.path.getsize(temptar)\n",
    "\n",
    "        # If the file size matches the expected size, break out of the loop\n",
    "        if file_size >= expected_file_size:\n",
    "            print(\"Download completed successfully.\")\n",
    "            print(\"Untarring\")\n",
    "            # Untar to destination\n",
    "            tar_command = f'tar -xf {temptar} -C {workdir}/'\n",
    "            os.system(tar_command)\n",
    "            os.remove(temptar)\n",
    "            print(f\"Results at {workdir}/{augment_job_id}\")\n",
    "            inference_out_path = f\"{workdir}/{augment_job_id}\"\n",
    "            break\n",
    "\n",
    "        # Set the headers to resume the download from where it left off\n",
    "        headers_download_job['Range'] = f'bytes={file_size}-'\n",
    "    # Open the file for writing in binary mode\n",
    "    with open(temptar, 'ab') as f:\n",
    "        try:\n",
    "            response = requests.get(endpoint, headers=headers_download_job, stream=True)\n",
    "            # Check if the request was successful\n",
    "            if response.status_code in [200, 206]:\n",
    "                # Iterate over the content in chunks\n",
    "                for chunk in response.iter_content(chunk_size=1024):\n",
    "                    if chunk:\n",
    "                        # Write the chunk to the file\n",
    "                        f.write(chunk)\n",
    "                        # Flush and sync the file to disk\n",
    "                        f.flush()\n",
    "                        os.fsync(f.fileno())\n",
    "            else:\n",
    "                print(f\"Failed to download file. Status code: {response.status_code}\")\n",
    "        except requests.exceptions.RequestException as e:\n",
    "            print(\"Connection interrupted during download, resuming download from breaking point\")\n",
    "            time.sleep(5)  # Sleep for a while before retrying the request\n",
    "            continue  # Continue the loop to retry the request"
   ]
  },
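  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, we can take a quick look at what the augmentation job produced by walking the downloaded results directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: summarize the downloaded augmentation results\n",
    "for root, dirs, files in os.walk(inference_out_path):\n",
    "    print(f\"{root}: {len(files)} files\")"
   ]
  },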
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Perform data analytics  <a class=\"anchor\" id=\"head-4\"></a>\n",
    "Next, we perform analytics with the KITTI dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assigning the task and action"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Defining the task\n",
    "network_arch = \"analytics\"\n",
    "action = \"analyze\"\n",
    "\n",
    "data = json.dumps({\"network_arch\": network_arch})\n",
    "\n",
    "endpoint = f\"{base_url}/experiments\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "data_analytics_experiment_id = response.json()[\"id\"]\n",
    "print(data_analytics_experiment_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# List tasks\n",
    "endpoint = f\"{base_url}/experiments\"\n",
    "\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(\"model id\\t\\t\\t     network architecture\")\n",
    "for rsp in response.json():\n",
    "    print(rsp[\"id\"],rsp[\"network_arch\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign the dataset\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Assign Dataset\n",
    "dataset_information = {\"inference_dataset\": kitti_dataset_id}\n",
    "\n",
    "data = json.dumps(dataset_information)\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{data_analytics_experiment_id}\"\n",
    "\n",
    "response = requests.patch(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the specs\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Get default spec schema\n",
    "endpoint = f\"{base_url}/experiments/{data_analytics_experiment_id}/specs/{action}/schema\"\n",
    " \n",
    "response = requests.get(endpoint, headers=headers)\n",
    "\n",
    "print(response)\n",
    "analytics_analyze_specs = response.json()[\"default\"]\n",
    "print(json.dumps(analytics_analyze_specs, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make changes to the specs if necessary\n",
    "print(json.dumps(analytics_analyze_specs, sort_keys=True, indent=4))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Execute the data analytics action\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Run action\n",
    "parent = None\n",
    "\n",
    "data = json.dumps({\"parent_job_id\":parent, \"action\":action, \"specs\":analytics_analyze_specs})\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{data_analytics_experiment_id}/jobs\"\n",
    "\n",
    "response = requests.post(endpoint, data=data, headers=headers)\n",
    "\n",
    "print(response)\n",
    "print(response.json())\n",
    "\n",
    "job_map[action] = response.json()\n",
    "print(job_map)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Monitor job status by repeatedly running this cell\n",
    "job_id = job_map[action]\n",
    "endpoint = f\"{base_url}/experiments/{data_analytics_experiment_id}/jobs/{job_id}\"\n",
    "\n",
    "while True:\n",
    "    clear_output(wait=True)\n",
    "    response = requests.get(endpoint, headers=headers)\n",
    "    \n",
    "    if \"error_desc\" in response.json().keys() and response.json()[\"error_desc\"] in (\"Job trying to retrieve not found\", \"No AutoML run found\"):\n",
    "        print(\"Job is being created\")\n",
    "        time.sleep(5)\n",
    "        continue\n",
    "    print(response)\n",
    "    print(json.dumps(response.json(), sort_keys=True, indent=4))\n",
    "       \n",
    "    if response.json().get(\"status\") in [\"Done\",\"Error\", \"Canceled\"] or response.status_code not in (200,201):\n",
    "        break\n",
    "    time.sleep(15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Download job contents\n",
    "job_id = job_map[action]\n",
    "\n",
    "endpoint = f\"{base_url}/experiments/{data_analytics_experiment_id}/jobs/{job_id}\"\n",
    "response = requests.get(endpoint, headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "expected_file_size = response.json().get(\"job_tar_stats\", {}).get(\"file_size\")\n",
    "print(\"expected_file_size: \", expected_file_size)\n",
    "\n",
    "\n",
    "endpoint = f'{base_url}/experiments/{data_analytics_experiment_id}/jobs/{job_id}:download'\n",
    "# Save\n",
    "temptar = f'{job_id}.tar.gz'\n",
    "\n",
    "while True:\n",
    "    # Check if the file already exists\n",
    "    headers_download_job = dict(headers)\n",
    "    if os.path.exists(temptar):\n",
    "        # Get the current file size\n",
    "        file_size = os.path.getsize(temptar)\n",
    "\n",
    "        # If the file size matches the expected size, break out of the loop\n",
    "        if file_size >= expected_file_size:\n",
    "            print(\"Download completed successfully.\")\n",
    "            print(\"Untarring\")\n",
    "            # Untar to destination\n",
    "            tar_command = f'tar -xf {temptar} -C {workdir}/'\n",
    "            os.system(tar_command)\n",
    "            os.remove(temptar)\n",
    "            print(f\"Results at {workdir}/{job_id}\")\n",
    "            inference_out_path = f\"{workdir}/{job_id}\"\n",
    "            break\n",
    "\n",
    "        # Set the headers to resume the download from where it left off\n",
    "        headers_download_job['Range'] = f'bytes={file_size}-'\n",
    "    # Open the file for writing in binary mode\n",
    "    with open(temptar, 'ab') as f:\n",
    "        try:\n",
    "            response = requests.get(endpoint, headers=headers_download_job, stream=True)\n",
    "            # Check if the request was successful\n",
    "            if response.status_code in [200, 206]:\n",
    "                # Iterate over the content in chunks\n",
    "                for chunk in response.iter_content(chunk_size=1024):\n",
    "                    if chunk:\n",
    "                        # Write the chunk to the file\n",
    "                        f.write(chunk)\n",
    "                        # Flush and sync the file to disk\n",
    "                        f.flush()\n",
    "                        os.fsync(f.fileno())\n",
    "            else:\n",
    "                print(f\"Failed to download file. Status code: {response.status_code}\")\n",
    "        except requests.exceptions.RequestException as e:\n",
    "            print(\"Connection interrupted during download, resuming download from breaking point\")\n",
    "            time.sleep(5)  # Sleep for a while before retrying the request\n",
    "            continue  # Continue the loop to retry the request"
   ]
  },
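  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Optionally, we can list the downloaded analytics artifacts to see which statistics files were produced."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: list the downloaded analytics artifacts\n",
    "for root, dirs, files in os.walk(inference_out_path):\n",
    "    for name in files:\n",
    "        print(os.path.join(root, name))"
   ]
  },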
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete model <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint = f\"{base_url}/experiments/{annotation_conversion_experiment_id}\"\n",
    "\n",
    "response = requests.delete(endpoint,headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint = f\"{base_url}/experiments/{pseudo_mask_experiment_id}\"\n",
    "\n",
    "response = requests.delete(endpoint,headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint = f\"{base_url}/experiments/{data_aug_experiment_id}\"\n",
    "\n",
    "response = requests.delete(endpoint,headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint = f\"{base_url}/experiments/{data_analytics_experiment_id}\"\n",
    "\n",
    "response = requests.delete(endpoint,headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete dataset <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Delete kitti dataset <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint = f\"{base_url}/datasets/{kitti_dataset_id}\"\n",
    "\n",
    "response = requests.delete(endpoint,headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Delete coco dataset <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint = f\"{base_url}/datasets/{coco_dataset_id}\"\n",
    "\n",
    "response = requests.delete(endpoint,headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Delete coco augment dataset <a class=\"anchor\" id=\"head-21\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "endpoint = f\"{base_url}/datasets/{coco_dataset_augment_id}\"\n",
    "\n",
    "response = requests.delete(endpoint,headers=headers)\n",
    "assert response.status_code in (200, 201)\n",
    "\n",
    "print(response)\n",
    "print(response.json())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
